Abstract. Lookahead search is perhaps the most natural and widely used game playing
strategy. Given the practical importance of the method, the aim of this paper is to provide a
theoretical performance examination of lookahead search in a wide variety of applications.
To determine a strategy play using lookahead search, each agent predicts multiple levels
of possible re-actions to her move (via the use of a search tree), and then chooses the play
that optimizes her future payoff accounting for these re-actions. There are several choices of
optimization function the agents can choose, where the most appropriate choice of function will
depend on the specifics of the actual game - we illustrate this in our examples. Furthermore,
the type of search tree chosen by a computationally-constrained agent can vary. We focus on the
case where agents can evaluate only a bounded number, k, of moves into the future. That is,
we use depth k search trees and call this approach k-lookahead search.

We apply our method in five well-known settings: AdWord auctions; industrial organization
(Cournot's model); congestion games; valid-utility games and basic-utility games; cost-sharing
network design games. We consider two questions. First, what is the expected social quality
of outcome when agents apply lookahead search? Second, what interactive behaviours can be
exhibited when players use lookahead search?

Myopic game playing (whose corresponding equilibria are Nash equilibria), where each player
can only foresee the immediate effect of her own actions, is the special case of 1-lookahead
search. Thus, for the first question, it is natural to ask whether social outcomes improve when
players use more foresight than in myopic behaviour. The answer depends on the game played:

(i) In AdWord auctions (or generalized second-price auctions), we show that 2-lookahead game
playing results in outcomes that are always optimal to within a constant factor; in contrast,
myopic game play can produce arbitrarily poor equilibrium outcomes.

(ii) For the Cournot game, applying 2-lookahead leads to a 12.5% increase in output and a
5.5% increase in social surplus compared with myopic competition. Similar bounds arise as
the length k of foresight increases.

(iii) For congestion games, as with myopic game playing, lookahead search leads to constant
factor qualitative guarantees.

(iv) For basic-utility games, on the other hand, whilst myopic game playing always leads to 
constant factor approximations, additional foresight can lead to arbitrarily bad solutions! 

(v) In a simple Shapley network design game, qualitative guarantees improve with the length 
of foresight. 

Regarding the second question, a variety of interesting game playing characteristics also 
arise with lookahead search. Stackelberg leader-follower behaviours can be induced when the 
players have asymmetric computational power. For example, Stackelberg equilibria can be
produced in the Cournot game. Lookahead search can also generate "uncoordinated" cooperative
behaviour! An example of this is shown for the Shapley network design game.
1. Introduction

Our goal here is not to prescribe how games should be played. Rather, we wish to analyse how 
games actually are played. To wit, we consider the strategy of lookahead search, described by Pearl
[58] in his classical book on heuristic search as being used by "almost all game-playing programs".
To understand the lookahead method and the reasons for its ubiquity in practice, consider an agent
trying to decide upon a move in a game. Essentially, her task is to evaluate each of her possible moves
(and then select the best one). Equivalently, if she knows the values of each child node in the game tree
then she can calculate the value of the current node. However, the values of the child nodes may also
be unknown! Recall two prominent ways to deal with this. Firstly, crude estimates based upon local
information could be used to assign values to the children; this is the approach taken by best response
dynamics. Secondly, the values of the children can be determined recursively by finding the values
of the grandchildren. At its computational extreme, this latter approach in a finite game is Zermelo's
algorithm - assign values to the leaf nodes^1 of the game tree and apply backwards induction to find
the value of the current node.

Both these approaches are special cases of lookahead search: choose a local search tree T rooted 
at the current node in the game tree; valuations (or estimates thereof) are given to leaf nodes of T; 
valuations for internal tree nodes are then derived using the values of a node's immediate descendants 
via backwards induction; a move is then selected corresponding to the value assigned to the root. For
best response dynamics the search tree is simply the star graph consisting of the root node and its 
children. With unbounded computational power, the search tree becomes the complete (remaining) 
game tree used by Zermelo's algorithm. 

We remark that the actual shape of the search tree T is chosen dynamically. For example, if local 
information is sufficient to provide a reliable estimate for a current leaf node w then there is no need 
to grow T beyond w. If not, longer branches rooted at w need to be added to T. Thus, despite our 
description in terms of "backwards induction" , lookahead search is a very forward looking procedure. 
Subject to our computational abilities, we search further forward only if we think it will help evaluate 
a game node. Indeed, in our opinion, it is this forward looking aspect that makes lookahead search 
such a natural method, especially for humans and for dynamic (or repeated) games.^2

Interestingly, the lookahead method was formally proposed as long ago as 1950 by Shannon [68], 
who considered it a practical way for machines to tackle complex problems that require "general 
principles, something of the nature of judgement, and considerable trial and error, rather than a 
strict, unalterable computing process". To illustrate the method, Shannon described in detail how it
could be applied by a computer to play chess. The choice of chess as an example is not a surprise:
as described, the lookahead approach is particularly suited to game-playing. It should be emphasised
again, however, that this approach is natural for all computationally constrained agents, not just for
computers. Lookahead search is an instinctive strategic method utilised by human beings as well.
For example, Shannon's work was in part inspired by De Groot's influential psychology thesis [31] on
human chess players. De Groot found that all players (of whatever standard) used essentially the same 
thought process - one based upon a lookahead heuristic. Stronger players were better at evaluating 
positions and at deciding how to grow (prune or extend) the search tree but the underlying approach 
was always the same. 

Despite its widespread application, there has been little theoretical examination of the consequences 
of decision making determined by the use of local search trees. The goal of this paper is to begin 
such a theoretical analysis. Specifically, what are the quantitative outcomes and dynamics in various 
games when players use lookahead search? 



^1 Often the values of the leaf nodes will be true values rather than estimates, for example when they correspond to
end positions in a game.

^2 In contrast, strategies that are prescribed by axiomatic principles, equilibrium constraints, or notions of regret are
much less natural for dynamic game players.
1.1. Lookahead Search: The Model.

Having given an informal presentation, let's now formally describe the lookahead method. Here we 
consider games with sequential moves that have complete information. These assumptions will help 
simplify some of the underlying issues, but the lookahead approach can easily be applied to games 
without these properties. 

We have a strategic game $G(\mathcal{P}, S, \{\alpha_i : i \in \mathcal{P}\})$. Here $\mathcal{P}$ is the set of $n$ players, $S_i$ is the set of
possible strategies for $i \in \mathcal{P}$, $S = (S_1 \times S_2 \times \cdots \times S_n)$ is the strategy space, and $\alpha_i : S \to \mathbb{R}$ is the payoff
function for player $i \in \mathcal{P}$. A state $s = (s_1, s_2, \ldots, s_n)$ is a vector of strategies $s_i \in S_i$ for each player
$i \in \mathcal{P}$.

Suppose player $i \in \mathcal{P}$ is about to decide upon a move. Recall, with lookahead search, she wishes to
assign a value to her current state node $s \in S$ that corresponds to the highest value of a child node.
To do this she selects a search tree $T_i$ over the set of states of the game rooted at $s$. For each leaf
node $l$ in $T_i$, player $i$ then assigns a valuation $\Pi_{j,l} = \alpha_j(l)$ for each player $j$. Valuations for internal
nodes in $T_i$ are then calculated by induction as follows: if player $p$ is destined to move at game node
$v$ then his valuation of the node is given by

$$\Pi_{p,v} = \max_{u \in C(v)} \left[ r_{p,v} + \Pi_{p,u} \right].$$

Here, $C(v)$ denotes the set of children of $v$ in $T_i$, and $r_{p,v}$ is some additional payoff received by player
$p$ at node $v$. Should $p$ choose the child $u^* \in C(v)$ then assume any non-moving player $j \neq p$ places
a value of $\Pi_{j,v} = r_{j,v} + \Pi_{j,u^*}$ on node $v$. Then given values for children of the root node $s$ of $T_i$,
player $i$ is thus able to compute the lookahead payoff $\Pi_{i,s}$ which she uses to select a move to play at
$s$. [The method is defined in an analogous manner if players seek to minimise rather than maximise
their "payoffs".]

After $i$ has moved, suppose player $j$ is then called upon to move. He applies the same procedure
but on a local search tree $T_j$ rooted at the new game node. Note that $j$'s move may not be the move
anticipated by $i$ in her analysis. For example, suppose all the players use 2-lookahead search. Then
player $i$ calculates on the basis that player $j$ will use a 1-lookahead search tree when he moves -
because for computational purposes it is necessary that the tree she attributes to $j$ lies within $T_i$.
But when he moves, player $j$ actually uses the 2-lookahead search tree $T_j$, and this tree goes beyond
the limits of $T_i$.
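The recursion above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `node_values` and `lookahead_move`, the callbacks `children`, `leaf_value` and `reward`, the round-robin order of moves, and the toy payoff table are all hypothetical and are not taken from the paper.

```python
def node_values(state, depth, mover, n_players, children, leaf_value, reward):
    """Valuation vector (one entry per player) of `state` in a depth-limited tree.

    The mover p takes the child maximising her own value,
    Pi_{p,v} = max_u [r_{p,v} + Pi_{p,u}], and every other player j
    receives Pi_{j,v} = r_{j,v} + Pi_{j,u*} for the chosen child u*.
    """
    kids = children(state, mover) if depth > 0 else []
    if not kids:  # leaf of the local search tree
        return [leaf_value(state, j) for j in range(n_players)]
    nxt = (mover + 1) % n_players  # assumption: players move in round-robin order
    best = None
    for u in kids:
        vals = node_values(u, depth - 1, nxt, n_players, children, leaf_value, reward)
        if best is None or vals[mover] > best[mover]:
            best = vals  # the mover keeps the child that is best for herself
    return [reward(state, j) + best[j] for j in range(n_players)]


def lookahead_move(state, k, mover, n_players, children, leaf_value, reward):
    """Child of `state` selected by the mover with a depth-k search tree (k >= 1)."""
    nxt = (mover + 1) % n_players
    return max(
        children(state, mover),
        key=lambda u: node_values(
            u, k - 1, nxt, n_players, children, leaf_value, reward
        )[mover],
    )


# Hypothetical toy game: a state is the tuple of moves made so far (at most two).
LEAF = {
    ("a",): (3, 0), ("b",): (2, 0),
    ("a", "a"): (0, 1), ("a", "b"): (1, 0),
    ("b", "a"): (2, 2), ("b", "b"): (5, 1),
}
children = lambda s, p: [s + (m,) for m in "ab"] if len(s) < 2 else []
leaf_value = lambda s, j: LEAF[s][j]
no_reward = lambda s, j: 0  # leaf model: all internal rewards are zero

myopic = lookahead_move((), 1, 0, 2, children, leaf_value, no_reward)
two_step = lookahead_move((), 2, 0, 2, children, leaf_value, no_reward)
```

In this toy instance the first player picks a different move with a depth-1 tree than with a depth-2 tree, because the deeper tree accounts for the second player's reply.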

1.2. Lookahead Search: The Practicalities. 

Observe that there is still a great deal of flexibility in how the players implement the model. This
versatility, we would argue, is a major strength (and another reason underlying its ubiquity) and not
a weakness of the method. For example, it accords well with Simon's belief, discussed in Section 1.4,
that behaviours should be adaptable. We now give some examples of this adaptability and highlight
those aspects that we analyse in this paper.

• Dynamic Search Trees. Recall that search trees may be constructed dynamically. Thus, the 
exact shape of the search tree utilized will be heavily influenced by the current game node, and the 
experience and learning abilities of the players. Whilst clearly important in determining gameplay and 
outcomes, these influences are a distraction from our focal point, namely, computation and dynamics 
in games in which players use lookahead search strategies. Therefore, we will simply assume here that 
each $T_i$ is a breadth first search tree of depth $k_i$. Implicitly, $k_i$ is dependent on the computational
facilities of player $i$.

• Evaluation Functions. Different players may evaluate leaf nodes in different ways. To evaluate 
internal nodes, as described above, we make the standard assumption that they use a max (or min) 
function. This need not be the case. For example, a risk-averse player may give a higher value to a 
node (that it does not own) with many high value children than to a node with few high value children 
- we do not consider such players here.

• Internal Rewards or Not: Path Model vs Leaf Model. We distinguish between two broad 
classes of game that fit in this framework but are conceptually quite different. In the first category, 
payoffs are determined only by outcomes at the end of the game. Valuations at leaf nodes in the local
search trees are then just estimates of what the final outcome will be if the game reaches that
point. Clearly chess falls into this category. In the second category, payoffs can be accumulated over
time - thus different paths with the same endpoints may give different payoffs to each player. Repeated
games, such as industrial games over multiple time periods, can be modelled as a single game in this
category. The first category is modelled by setting all internal rewards $r_{p,v} = 0$. Thus what matters
in decision making is simply the initial (estimated) valuations a player puts on the leaf nodes. We
call this the leaf (payoff) model as an agent then strives to reach a leaf of $T_i$ with as high a value as
possible. The second category arises when the internal rewards, $r_{p,v}$, can be non-zero. Each agent
then wishes to traverse paths that allow for high rewards along the way. More specifically, in this
model, called the path (payoff) model, the internal reward is $r_{p,v} = \alpha_p(v)$.

• Order of Moves: Worst-Case vs Average-Case. In multiplayer games, the order in which 
the players move may not be fixed. This adds additional complexity to the decision making process, 
as the local search tree will change depending upon the order in which players move. Here, we will 
examine two natural approaches a player may use in this situation: worst case lookahead and average 
case lookahead. In the former situation, when making a move, a risk-averse player will assume that 
the subsequent moves are made by different players chosen by an adversary to minimize that player's 
payoff. In the latter case, the player will assume that each subsequent move is made by a player chosen 
uniformly at random; we allow players to make consecutive moves. In both cases, to implement the 
method the player must perform calculations for multiple search trees. This is necessary to either find 
the worst-case or perform expectation calculations. 
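The last two modelling choices can be illustrated with a small Python sketch. It assumes, purely for illustration, that a player has already computed her payoffs along one root-to-leaf path and her lookahead value under each possible choice of next mover; the function names `path_value` and `order_aggregate` and the sample numbers are hypothetical.

```python
from statistics import mean


def path_value(internal_rewards, leaf_estimate, model):
    """Value of one root-to-leaf path for a single player.

    internal_rewards: the payoffs alpha_p(v) at the internal nodes on the path.
    leaf_estimate:    the valuation alpha_p(l) assigned to the leaf.
    """
    if model == "leaf":   # internal rewards are set to zero
        return leaf_estimate
    if model == "path":   # rewards accumulate along the way
        return sum(internal_rewards) + leaf_estimate
    raise ValueError("model must be 'leaf' or 'path'")


def order_aggregate(value_by_next_mover, mode):
    """Combine the values obtained for each possible next mover."""
    values = list(value_by_next_mover.values())
    if mode == "worst":    # an adversary picks the order of moves
        return min(values)
    if mode == "average":  # the next mover is drawn uniformly at random
        return mean(values)
    raise ValueError("mode must be 'worst' or 'average'")


# Hypothetical numbers for one player.
leaf_only = path_value([1, 2], 4, "leaf")
accumulated = path_value([1, 2], 4, "path")
by_mover = {"A": 3, "B": 1, "C": 5}
worst = order_aggregate(by_mover, "worst")
average = order_aggregate(by_mover, "average")
```

Both aggregates require one search tree per candidate order of moves, which is why the text notes that the player must perform calculations for multiple search trees.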

1.3. Techniques and Results. 

We want to understand the social quality of outcomes that arise when computationally-bounded agents
use $k$-lookahead search to optimise their expected or worst-case payoff over the next $k$ moves. Two
natural ways we do this are via equilibria and via the study of game dynamics. To explain these
approaches, consider the following definition. Given a lookahead payoff function, $\Pi_{i,s}$, a lookahead
best-response move for player $i$, at a state $s \in S$, is a strategy $s_i$ maximising her lookahead payoff,
that is, $\forall s'_i \in S_i : \Pi_{i,s} \ge \Pi_{i,(s_{-i}, s'_i)}$. [A move $s'_i$ for player $i$, at a state $s \in S$, is lookahead improving
if $\Pi_{i,s} < \Pi_{i,(s_{-i}, s'_i)}$.] A lookahead equilibrium is then a collection of strategies such that each player is
playing her lookahead best-response move for that collection of strategies. Our focus here is on pure
strategies. Then, given a social value for each state, the coordination ratio (or price of anarchy) of
lookahead equilibria is the worst possible ratio between the social value of a lookahead equilibrium and
the optimal global social value.

To analyse the dynamics of lookahead best-response moves, we examine the expected social value
of states on polynomial length random walks on the lookahead state graph, $Q$. This graph has a node
for each state $s \in S$ and an edge from $s$ to a state $t$ with a label $i \in \mathcal{P}$ if the only difference between
$s$ and $t$ is that player $i$ changes strategy from $s_i$ to $t_i$, where $t_i$ is the lookahead best response move
at $s$. The coordination ratio of lookahead dynamics is the worst possible ratio between the expected
social value of states on a polynomially long random walk on $Q$ and the optimal global social value.
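A minimal Python sketch of these definitions follows. It is an illustration under assumptions, not the construction used in the paper: the lookahead payoff is supplied as a black-box function, self-loops are kept when a player is already best-responding (so that walks are defined at equilibria), and the two-player coordination game used as input is hypothetical.

```python
import itertools
import random


def lookahead_state_graph(strategy_sets, lookahead_payoff):
    """edges[s] lists (player, t): t is s with the player's best response swapped in."""
    edges = {}
    n = len(strategy_sets)
    for s in itertools.product(*strategy_sets):
        out = []
        for i in range(n):
            t_i = max(
                strategy_sets[i],
                key=lambda x: lookahead_payoff(i, s[:i] + (x,) + s[i + 1:]),
            )
            out.append((i, s[:i] + (t_i,) + s[i + 1:]))
        edges[s] = out
    return edges


def equilibria(edges):
    """States in which every player is already playing her best response."""
    return [s for s, out in edges.items() if all(t == s for _, t in out)]


def walk_average(edges, start, social_value, steps, rng):
    """Average social value along one random walk (uniformly random mover)."""
    s, total = start, 0.0
    for _ in range(steps):
        _, s = rng.choice(edges[s])
        total += social_value(s)
    return total / steps


# Hypothetical coordination game: both players get 2 at (1, 1), 1 at (0, 0), else 0.
def payoff(i, s):
    if s[0] != s[1]:
        return 0
    return 2 if s[0] == 1 else 1


social = lambda s: payoff(0, s) + payoff(1, s)
edges = lookahead_state_graph([(0, 1), (0, 1)], payoff)
eq = equilibria(edges)
optimum = max(social(s) for s in edges)
good_ratio = walk_average(edges, (1, 1), social, 50, random.Random(0)) / optimum
bad_ratio = walk_average(edges, (0, 0), social, 50, random.Random(0)) / optimum
```

The coordination ratio is defined as a worst case over walks, so a single walk such as the ones above only gives one sample of the quantity being bounded.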

For practical reasons, we are usually more interested in the dynamics of lookahead best-response 
moves than in equilibria. For example, as with other equilibrium concepts, lookahead best-response 
moves may not lead to lookahead equilibria. Indeed, such equilibria may not even exist. Typically, 
though, the methods used to bound the coordination ratio for $k$-lookahead equilibria can be combined
with other techniques to bound the coordination ratio for $k$-lookahead dynamics. We show how to
do this for congestion games in Section 4; see also Goemans et al. [30] for several examples with
respect to 1-lookahead dynamics. Consequently, for both simplicity and brevity, most of the results 
we give here concern the coordination ratio for lookahead equilibria. We are particularly interested in 
discovering when lookahead equilibria guarantee good social solutions, and how outcomes vary with 
different levels of foresight (k). We perform our analyses for an assortment of games including an 
AdWord auction game, the Cournot game, congestion games, valid-utility games, and a cost-sharing 
network design game. 
We begin, in Section 2, by considering strategic bidding in an AdWord generalised second-price
auction, and studying the social values of the allocations in the resulting equilibria. In particular, we
show that 2-lookahead game playing results in the optimal outcome or a constant-factor approximate
outcome under the leaf and path models, respectively. This is in contrast to 1-lookahead (myopic)
game playing which can result in arbitrarily poor equilibrium outcomes, and shows that more
forward-thinking bidders would produce efficient outcomes.

Second, in Section 3, we examine the Cournot duopoly game. Here two firms compete in producing
a good consumed by a set of buyers via the choice of production quantities. We study equilibria of
these simple games resulting from $k$-lookahead search. The equilibria of these simple games for myopic
game playing, $k = 1$, are well understood. For $k > 1$, however, firms produce over 10% more than if
they were competing myopically; this is better for society as it leads to around a 5% increase in social
surplus. Surprisingly, the optimal level of foresight for society is $k = 2$. Furthermore, we show that
Stackelberg behaviours arise as a special case of lookahead search where the firms have asymmetric
computational abilities.
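For readers who want concrete numbers, the following Python sketch compares the myopic (Cournot-Nash) outcome with the leader-follower (Stackelberg) outcome in the standard textbook duopoly. The linear inverse demand $P(Q) = a - Q$, the common constant marginal cost $c$, and all function names are assumptions made for this illustration; the model analysed in Section 3 may differ in its details.

```python
def best_response(q_other, a, c):
    """Myopic best response of a firm when the rival produces q_other."""
    return max(0.0, (a - c - q_other) / 2.0)


def cournot_by_iteration(a, c, rounds=200):
    """Alternate myopic best responses until they settle at the Cournot-Nash point."""
    q1 = q2 = 0.0
    for _ in range(rounds):
        q1 = best_response(q2, a, c)
        q2 = best_response(q1, a, c)
    return q1, q2


def stackelberg(a, c):
    """Leader anticipates the follower's best response (closed form for linear demand)."""
    q_leader = (a - c) / 2.0
    return q_leader, best_response(q_leader, a, c)


def social_surplus(total_q, a, c):
    """Area between the inverse demand a - Q and the marginal cost c, up to total_q."""
    return (a - c) * total_q - total_q ** 2 / 2.0


a, c = 10.0, 1.0  # hypothetical demand intercept and marginal cost
q1, q2 = cournot_by_iteration(a, c)
ql, qf = stackelberg(a, c)
output_ratio = (ql + qf) / (q1 + q2)
surplus_ratio = social_surplus(ql + qf, a, c) / social_surplus(q1 + q2, a, c)
```

Under these assumptions the leader-follower outcome has total output 12.5% above the Cournot-Nash total and social surplus roughly 5.5% higher (a ratio of 135/128).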

Third, in Section 4, we examine congestion games with linear latency functions, and study the
average delay of players in those games. We show that 2-lookahead game playing results in
constant-factor approximate solutions. In particular, the coordination ratio of lookahead dynamics is a constant.
These guarantees are similar to those obtained via 1-lookahead.

Fourth, in Section 4.1 we consider two classes of resource sharing games, known as valid-utility
and basic-utility games. For both of these games, we show that lookahead game playing may result
in very poor solutions. For valid-utility games, we show $k$-lookahead can give a coordination ratio for
lookahead dynamics of $\Theta(\sqrt{n})$. Myopic game play can also give very poor solutions [30], but additional
foresight does not significantly improve outcomes in the worst case. For basic-utility games, however,
myopic game dynamics give a constant coordination ratio [30] whereas we show that 2-lookahead game
playing may result in $o(1)$-approximate social welfare with the leaf model. Thus, contrary to what is
traditionally assumed in decision theory, additional foresight in games need not lead to better outcomes.

Finally, in Section 5, we present a simple example of a cost-sharing network design game that
illustrates how the use of lookahead search can encourage cooperative behaviour (and better outcomes) 
without a coordination mechanism. 

Observe that our results show that lookahead search has different effects depending upon the game. 
It would be interesting to study further which game structures lead to more beneficial outcomes when
longer foresight is used, and which game structures lead to more detrimental outcomes. 



1.4. Background and Related Work. 

This work is best viewed within the setting of bounded rationality pioneered by Herb Simon. In 
Rational Choice Theory a rational agent (or economic man) makes decisions via utility maximisation. 
Whilst the non-existence of economic man is not in doubt, rationality remains a central assumption in 
economic thought. This is typically justified using an "as if" argument, as expounded by Friedman [26]: whether
people are actually rational or not is unimportant provided their actions can be viewed in a way
that is consistent with rational decision making - that is, provided agents act as if they are rational.^3
Friedman concluded that a model should be judged by its predictive value rather than by the realism
of its assumptions. On this scale rationality often (but not always) does very well. 

However, motivated by considerations of computational power and predictive ability, Simon [69]
argued that "the task is to replace the global rationality of economic man with a kind of rational
behaviour that is compatible with the access to information and the computational capacities that are
actually possessed by organisms, including man, in the kinds of environments in which such organisms
exist". He argued that, instead of optimising, agents apply heuristics in decision making. An example
of this is the satisficing heuristic: agents search for feasible solutions, stopping when they discover
an outcome that achieves an aspired level of satisfaction.^4 We remark that the use of a search phase
provides a fundamental distinction between rational and boundedly rational agents. For rational
agents the search is irrelevant as they will anyway make an optimal choice given the constraints of
the problem. For agents of bounded rationality the form of the search can heavily influence decision
making.

^3 For example, a consumer whose purchasing strategy allocates fixed proportions of her budget to specific
goods (regardless of price levels) can be viewed as a rational consumer with a Cobb-Douglas utility function!

Interestingly, De Groot's work on chess players also heavily influenced Simon's general thinking on 
cognitive science.^5 This is exemplified in his famous book with Newell on human problem solving [57],
where humans are viewed as information processing systems. 

The label bounded rationality is currently used in a number of disparate areas some of which actually 
go against the main thrust of Simon's original ideas; see Selten [64] and Rubinstein [59] for some
discussion on this point. Two schools of thought developed by psychologists, experimental economists, 
and behavioural economists are, however, well worth mentioning here. First, the Heuristics and Biases 
program espoused by Kahneman and Tversky and, second, the Fast and Frugal Heuristics program 
espoused by Gigerenzer. Whilst both programs agree that humans routinely use simple heuristics in 
decision making, their philosophical outlooks are very different. The former program primarily looks 
for outcomes (caused by the use of heuristics) in violation of subjective expected utility theory, and
views such biases as a sign of irrationality likely to lead to poor decision making. In contrast, the 
latter program views the use of heuristics as natural and, in principle, entirely compatible with good 
decision making. For example, simple heuristics may be more robust to environmental changes and 
actually outperform methods based upon subjective expected utility maximisation. As with the work
of Simon, for the fast and frugal heuristics school, the actual quality of an heuristic is assumed to be 
dependent upon the search - how to search and when to stop searching - and the choice of decision rule 
after the search is terminated. Clearly, the lookahead heuristic can be viewed in this light: there is a 
search (via a local search tree), there is a "stopping rule" (determined, for example, by computational 
constraints and by the expertise of the player) , and there is a decision rule (backwards induction) . 

The value of lookahead search in decision-making has been examined by the artificial intelligence 
community [55]; for examples in effective diagnostics and real-time planning see [10] and [63]. Lookahead
search is also related to the sequential thinking framework in game theory [52, 73]. However,
compared to these works and the research carried out by the two schools above, our focus is more 
theoretical and less experimental and psychological. Specifically, we desire quantitative performance 
guarantees for our heuristics. 

Our research is also related to works on the price of anarchy in a game, and convergence of game 
dynamics to approximately optimal solutions [ 50t i30j and to sink equilibria |3L)l ?IT\ . Numerous articles 
study the convergence rate of best-response dynamics to approximately optimal solutions [T^l [Ml HI [S] . 
For example, polynomial-time bounds have been proven for the speed of convergence to approximately
optimal solutions for approximate Nash dynamics in a large class of potential games [1], and for
learning-based regret-minimisation dynamics for valid-utility games [9]. Our work differs from all
the above as none of them capture lookahead dynamics. In another line of work, convergence of 
best-response dynamics to (approximate) equilibria and the complexity of game dynamics and sink 
equilibria have been studied [22l [H [111 E21 [HI [IH] , but our paper does not focus on these types of 
dynamics or convergence to equilibria. 

Motivated by concerns of stability, convergence, and predictability of equilibria and game dynamics,
various equilibrium concepts other than Nash equilibria have been studied in the economics literature.
Among them are correlated equilibria [2], stable equilibria [S], stochastic adjustment models [38],
strategy subsets closed under rational behaviour (CURB sets) [6], iterative elimination of dominated
strategies, the set of undominated strategies, etc. Convergence and strategic stability of equilibria in
evolutionary game theory is also an important subject of study. Many other game-theoretic models
have been proposed to capture the self-interested behaviour of agents. As well as best-response
dynamics, noisy best-response dynamics [20^ \79\ [5T], where players occasionally make mistakes, and
simultaneous Nash dynamics, where all players change their strategies simultaneously, are both
well-studied. In many other models the effect of learning algorithms [50] is examined, for example,
regret minimisation dynamics [23 [321 E3 UHl [3 El [H] and fictitious play [TT]. In most of these studies
the most important factor is the stability of equilibria, and not measurements of the social value of
equilibria. Furthermore, most of them are motivated by theoretical game theoretic concepts rather
than practical game-playing, and none of the above works consider lookahead search.

^4 Over time, and depending upon what is found in the search, this aspiration level may be changed.

^5 In fact, Simon sent his student George Baylor to help translate De Groot's work into English.

2. Generalised Second-Price Auctions 

For our first example, we apply the lookahead model to generalised second-price (GSP) auctions. 
Our main results are that outcomes are provably good when agents use additional foresight; in contrast, 
myopic behaviour can produce very poor outcomes. 

The auction set-up is as follows. There are $T$ slots with click-through rates $c_1 > c_2 > \cdots > c_T > 0$,
that is, higher indexed slots have lower click-through rates. There are $n$ players bidding for these
slots, each with a private valuation $v^i$. Each player $i$ makes a bid $b^i$. Slots are then allocated via a
generalised second price auction. Denote the $j$th highest bid in the descending bid sequence by $b_j$,
with corresponding valuation $v_j$. The $j$th best slot, for $j \le T$, is assigned to the $j$th highest bidder
who is charged a price equal to $b_{j+1}$. The $T$ highest bidders are called the "winners". According to
the pricing mechanism, if bidder $i$ were to get slot $t$ in the final assignment, then he would get utility
$u^i_t = (v^i - b_{t+1}) c_t$. We denote a player $i$'s utility if he bids $b^i$ by $u^i(b^i)$ (the other players' bids are
implicit inputs for $u^i$).
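The allocation and pricing rule just described can be sketched as follows. The function name `gsp_outcome`, the tie-breaking by player index, and the price of zero charged when no lower bid exists are assumptions of this illustration rather than details given in the text.

```python
def gsp_outcome(bids, valuations, ctrs):
    """Return {player: (slot, price, utility)} for the winners of a GSP auction.

    Slots are numbered from 1 (best) to T = len(ctrs). The bidder in slot t pays
    the next highest bid b_{t+1} and receives utility (v - b_{t+1}) * c_t.
    """
    # Sort players by descending bid; Python's sort is stable, so ties go to
    # the lower index (an assumption of this sketch).
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    result = {}
    for pos, player in enumerate(order[: len(ctrs)]):
        # Assumption: the price is zero if nobody bids below this player.
        price = bids[order[pos + 1]] if pos + 1 < len(order) else 0.0
        utility = (valuations[player] - price) * ctrs[pos]
        result[player] = (pos + 1, price, utility)
    return result


# Hypothetical instance: two slots, three bidders.
outcome = gsp_outcome(bids=[9, 6, 3], valuations=[10, 8, 3], ctrs=[1.0, 0.5])
```

Here player 0 wins the top slot and pays the second highest bid, player 1 takes the second slot and pays the third highest bid, and player 2 is a "losing" player.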

This auction is used in the context of keyword ad auctions (e.g., Google AdWords) for sponsored
search. Given the continuous nature of bids in the GSP auction, the best response of each bidder $i$
for any vector of bids by other bidders corresponds to a range of bid values that will result in the
same outcome from $i$'s perspective. Among this set of bid values, we focus on a specific bid value
$b^i$, called the balanced bid [13]. The balanced bid $b^i$ is a best-response bid that is as high as possible
such that player $i$ cannot be harmed by a player with a better slot undercutting him, i.e. bidding just
below him. It is easy to calculate that for player $i$ in slot $t$, $1 < t \le T$, the only balanced bid is

$$b^i = \left(1 - \frac{c_t}{c_{t-1}}\right) v^i + \frac{c_t}{c_{t-1}}\, b_{t+1}.$$

An important property of balanced bidding is that each "losing" player $i$ (one not assigned a slot)
should bid truthfully, that is $b^i = v^i$. To see this add dummy slots with $c_t = 0$ if $t > T$. The player
who wins the top slot should also bid truthfully under balanced bidding. Balanced bidding is the most
commonly used bidding strategy [13, 48]. For some intuition behind this, note that balanced bidding
has several desirable properties. For a competitive firm, bidding high obviously increases the chance 
of obtaining a good slot. Within a slot this also has the benefit of pushing up the price a competitor 
pays without affecting the price paid by the firm. On the other hand, bidding high increases the upper 
bound on the price the firm may pay, leading to the possibility that the firm may end up paying a high 
price for one of the less desirable slots. Balanced bidding eliminates the possibility that a change in 
bid from a higher bidder can hurt the firm. (Clearly, it is impossible to obtain such a guarantee with 
respect to a lower bidder.) Thus, balanced bidding provides some of the benefits of high bidding at 
less risk. Balanced bidding naturally converges to Nash equilibria unlike other bidding strategies such 
as altruistic bidding or competitor busting [13]. Moreover, the other bidding strategies would require
some discretization of players' strategy space in order to analyse the best response dynamics [131 HH]. 
Consequently, balanced bidding is the most natural strategy choice for our analysis. 
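As a quick numerical illustration of the balanced bid, the sketch below computes it for a player in slot $t$ and checks the defining indifference: the player is exactly as well off keeping slot $t$ at price $b_{t+1}$ as taking slot $t-1$ at a price equal to her own bid. The function name `balanced_bid` and the sample numbers are hypothetical.

```python
def balanced_bid(value, slot, next_bid, ctrs):
    """Balanced bid of a player with valuation `value` occupying `slot` (1-based).

    next_bid is b_{t+1}, the bid immediately below the player. Players in the
    top slot and losing players (slot > T) bid truthfully.
    """
    T = len(ctrs)
    if slot == 1 or slot > T:
        return value
    ratio = ctrs[slot - 1] / ctrs[slot - 2]  # c_t / c_{t-1}
    return (1 - ratio) * value + ratio * next_bid


ctrs = [1.0, 0.5]  # hypothetical click-through rates c_1 > c_2
b = balanced_bid(value=8, slot=2, next_bid=3, ctrs=ctrs)

# Indifference check: c_t (v - b_{t+1}) equals c_{t-1} (v - b).
keep_slot = ctrs[1] * (8 - 3)
move_up = ctrs[0] * (8 - b)
```

With these numbers the balanced bid is 5.5, and both sides of the indifference condition equal 2.5.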

For this auction problem, we consider only the leaf model. The leaf model seems more natural 
than the path model for a single auction as players are interested in the final allocation output by the 
auction (there are no intermediary payoffs). We analyse both worst-case and average-case lookahead; 
depending upon the level of risk-aversion of the agents both cases seem natural in auction settings. 
Let player $i$'s lookahead payoff (or utility) at bid $b^i$ with respect to player $j$, denoted by $u^{i,j}(b^i)$, be
player $i$'s payoff (or utility) after player $j$ makes a best-response move. In the worst-case lookahead
model, we define player $i$'s lookahead payoff for a vector $b$ of bids as $\Pi_{i,b} = \bar{u}^i(b^i) = \min_j u^{i,j}(b^i)$. In
the average-case lookahead model, player $i$'s lookahead payoff $\Pi_{i,b}$ for a bid vector $b$ is $\Pi_{i,b} = \bar{u}^i(b^i) =
\frac{1}{n} \sum_j u^{i,j}(b^i)$. Changing strategy from bid $b'^i$ to bid $b^i$ is a lookahead improving move if lookahead
utility increases, i.e., $\bar{u}^i(b^i) > \bar{u}^i(b'^i)$. We are at a lookahead equilibrium if no player has a lookahead
improving move.

It is known that the social welfare of Nash equilibria for myopic game playing can be arbitrarily
bad [13] unless we disallow over-bidding [46]. Here, we prove the advantage of additional foresight by
showing that 2-lookahead equilibria have much better social welfare. In particular, we show that all
such equilibria are optimal in the worst-case lookahead model, and all such equilibria are
constant-factor approximate solutions in the average-case lookahead model.

2.1. Worst-Case Lookahead. 

Our proof for the worst-case lookahead model can be seen as a generalisation of the proof of [12] for
a slightly different model. We start by proving a useful lemma in this context. 

Lemma 2.1. Consider the worst-case lookahead model with the leaf model. Label the players so that
player $i$ is in slot $i$, and suppose there is a player $t$ such that $v_t < v_{t+1}$. Then player $t$
myopically prefers slot $t+1$ to slot $t$.

Proof. Suppose not. Then, as player $t$ does not myopically prefer slot $t+1$, we have

$$(v_t - b_{t+1})\,c_t \ge (v_t - b_{t+2})\,c_{t+1}.$$

By definition, $b_{t+1} = v_{t+1} - \frac{c_{t+1}}{c_t}(v_{t+1} - b_{t+2})$. Plugging this in gives

$$(v_t - b_{t+2})\,c_{t+1} \le \Big(v_t - v_{t+1} + \frac{c_{t+1}}{c_t} v_{t+1} - \frac{c_{t+1}}{c_t} b_{t+2}\Big) c_t
< \Big(\frac{c_{t+1}}{c_t} v_t - \frac{c_{t+1}}{c_t} b_{t+2}\Big) c_t = (v_t - b_{t+2})\,c_{t+1}.$$

Thus we obtain our desired contradiction. Note that the strict inequality above follows directly from
the fact that $v_t < v_{t+1}$. □
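
To make Lemma 2.1 concrete, here is a minimal Python sketch that evaluates both sides of the myopic comparison for one instance. The click-through rates, valuations and the bid $b_{t+2}$ are hypothetical numbers chosen for illustration, and the helper names are not from the paper; only the balanced-bid formula is taken from the proof above.

```python
def balanced_bid(value, ctr_own, ctr_above, bid_below):
    """Balanced bid of the player in slot t+1: v - (c_{t+1} / c_t) * (v - b_{t+2})."""
    return value - (ctr_own / ctr_above) * (value - bid_below)

def myopic_utility(value, price, ctr):
    """Myopic utility of holding a slot: (valuation - price paid) * click-through rate."""
    return (value - price) * ctr

# Hypothetical instance with v_t < v_{t+1} and c_t > c_{t+1}.
c_t, c_t1 = 10.0, 5.0      # click-through rates of slots t and t+1
v_t, v_t1 = 4.0, 8.0       # valuations of the players in slots t and t+1
b_t2 = 1.0                 # bid of the player in slot t+2

b_t1 = balanced_bid(v_t1, c_t1, c_t, b_t2)       # price paid in slot t
stay = myopic_utility(v_t, b_t1, c_t)            # player t keeps slot t
move_down = myopic_utility(v_t, b_t2, c_t1)      # player t takes slot t+1
```

Here `move_down` exceeds `stay`, as the lemma predicts for a player whose valuation is below that of the player directly beneath them.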

An equilibrium is output truthful if the slots are assigned to the same bidders as they would be if 
bidders were to bid truthfully. It is easy to verify that an allocation optimizes social welfare if
and only if it is output truthful. Thus to prove 2-lookahead equilibria are socially optimal it suffices 
to show they are output truthful. 

Theorem 2.2. For GSP auctions, any 2-lookahead equilibrium gives optimal social welfare in the 
worst-case, leaf model. 

Proof. We proceed by contradiction. Consider a non-output-truthful 2-lookahead equilibrium. Again, 
label the players so that the player i is in slot i. Amongst all the winning players, take the one with 
the lowest valuation, $v_i$. First suppose that $v_i$ is not amongst the T highest valuations. Then, there is
a losing player with a higher value than $v_i$. But this player is bidding his value, as a result of balanced
bidding. Consequently, player i's utility must be negative, a contradiction. 

Thus, we may assume that $v_i$ is amongst the T highest valuations; specifically it must have exactly
the Tth highest valuation. We will show that player i moving into slot T is a lookahead improving 
move. Notice that the lookahead value for player i staying in slot i is at most the myopic value of 
staying in that slot. This follows as the choice of a player two slots below i cannot improve the utility 
of player i (neither in terms of price nor slot position), but only could make it worse. Hence, it suffices 
to show that the lookahead value of changing slots is better than the myopic value of staying in slot i. 



By several applications of Lemma 2.1, we see that player i myopically prefers slot T to slot i.
However, in moving to slot T, player i will still make a balanced bid. Thus, no other winning player
may reduce i's utility by undercutting him. Also, no losing player j wants to move to a winning slot as
they can only be left with negative utility, since j cannot then be amongst the T highest valuations.
So moving to slot T is a lookahead improving move for player i.

If player i were originally in slot T, then the entire argument can be applied with regards to slots 
1 to T - 1. Inductively, we then conclude that in any non-output-truthful equilibrium, there is a
lookahead improving move, which is a contradiction. This gives us the desired result. □ 

2.2. Average Case Lookahead. 

Next, we consider the average-case lookahead model, and show that the above theorem does not hold 
for this case. 

Theorem 2.3. In GSP auctions, there exist 2-lookahead equilibria that are not output-truthful in the 
average-case, leaf model. 

Proof. Consider the following example with n = T = 4. Let the click-through rates be $c_1 = 35$,
$c_2 = 26$, $c_3 = 25$, and $c_4 = 20$. Let the valuations be $v_1 = 82$, $v_2 = 83$, $v_3 = 100$, $v_4 = 93$.
Starting with the highest slot and working to the lowest, let bidder i bid the balanced bid for slot i. It
can be verified that this turns out to be a non-output-truthful equilibrium. □
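
The numbers in this example can be explored with a short Python sketch. It computes balanced bids with the formula $b_{t+1} = v_{t+1} - \frac{c_{t+1}}{c_t}(v_{t+1} - b_{t+2})$ from the proof of Lemma 2.1, and compares the social welfare of the stated allocation with the output-truthful one. Two assumptions are made here that the text does not spell out: the price below the last slot is taken to be 0, and the top bidder bids their valuation. The sketch does not verify the lookahead equilibrium property itself.

```python
def balanced_bids(values, ctrs):
    """Balanced bids for players already ordered by slot (index 0 = top slot).

    Assumes the price below the last slot is 0 and that the top bidder
    bids their valuation; both are assumptions of this sketch.
    """
    n = len(values)
    bids = [0.0] * n
    bid_below = 0.0
    for slot in range(n - 1, 0, -1):
        v = values[slot]
        bids[slot] = v - (ctrs[slot] / ctrs[slot - 1]) * (v - bid_below)
        bid_below = bids[slot]
    bids[0] = float(values[0])
    return bids

def social_welfare(values, ctrs):
    """Sum of valuation times click-through rate over the slots."""
    return sum(v * c for v, c in zip(values, ctrs))

ctrs = [35, 26, 25, 20]        # c_1..c_4 from the example
values = [82, 83, 100, 93]     # v_1..v_4 from the example (player i sits in slot i)

bids = balanced_bids(values, ctrs)
welfare_here = social_welfare(values, ctrs)
welfare_truthful = social_welfare(sorted(values, reverse=True), ctrs)
```

The bids come out in decreasing order, so the slot assignment is consistent, while `welfare_here` is strictly below `welfare_truthful`; the allocation is therefore not output truthful.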

Despite this negative result, 2-lookahead equilibria cannot have arbitrarily bad social welfare. 

Theorem 2.4. In GSP auctions, the coordination ratio of 2-lookahead equilibria is constant in the 
average-case, leaf model. 

Proof. Suppose that we are at an equilibrium. Let $v_{i^*}$ be the $i$-th highest valuation, let player
$i^*$ denote the corresponding player, let $b_{i^*}$ denote their bid, and let $c_{i^*}$ be the click-through
rate of the slot they currently occupy. We recall that $v_i$ denotes the valuation of the player in slot
$i$, which has click-through rate $c_i$ and bid $b_i$. The social utility of a set $A$ of players is
$\sum_{j \in A} v_j c_j$. Thus, by the above definitions, the optimal social utility is $\sum_i v_{i^*} c_i$.

Now, choose $\alpha, \beta < 1$ such that $(1-\alpha)^2 > m\beta$. Let $I$ be the set of indices $i$ that
satisfy both $v_i < \alpha v_{i^*}$ and $c_{i^*} < \beta c_i$. Note that for all $i \notin I$ the pair of
players $v_i, v_{i^*}$ contribute at least $\min\{\alpha, \beta\}\, v_{i^*} c_i$ to OPT. So if $I$ is empty,
then we have achieved a constant coordination ratio. We may thus suppose $I$ is not empty and choose
$i \in I$.

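Consider $c_{i^*-1}$. As we assume "balanced" bidding,

$$b_{i^*} \ge \Big(1 - \frac{c_{i^*}}{c_{i^*-1}}\Big) v_{i^*}.$$

Since $b_{i^*} < b_i < v_i < \alpha v_{i^*}$ by assumption, we have $c_{i^*-1} < \frac{1}{1-\alpha} c_{i^*}$.
Choose $m > 1$. We first prove the following claim.

Claim 2.5. For all $i \in I$, we have $c_{i+1} < \frac{c_i}{m}$.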

Proof. Suppose $c_{i+1} \ge \frac{c_i}{m}$ for some $i \in I$. We will show that player $i^*$ moving into
slot $i$ is then lookahead improving. Consider his lookahead utility for staying put. Ignoring a repeat
move for player $i^*$, which occurs with probability $\frac{1}{n}$, player $i^*$'s utility in every other
circumstance is at most $c_{i^*-1} v_{i^*}$, as other players can improve his position by at most one. On
the other hand, if player $i^*$ moves into slot $i$ then his lookahead utility is at least
$c_{i+1}(v_{i^*} - b_i)$; he wins at least slot $i+1$ and pays at most his bid. If player $i^*$ is chosen to
repeat his move then his utility is the same for both cases (as he will then simply play a best response
move). Thus, it is enough for us to show that

$$c_{i+1}(v_{i^*} - b_i) > c_{i^*-1} v_{i^*}.$$

However, $b_i < v_i < \alpha v_{i^*}$ and putting this together with the above inequalities gives

$$c_{i+1}(v_{i^*} - b_i) > \frac{c_i}{m}(1-\alpha)\, v_{i^*} > \frac{\beta}{1-\alpha}\, c_i v_{i^*}
> \frac{1}{1-\alpha}\, c_{i^*} v_{i^*} > c_{i^*-1} v_{i^*}.$$

We are now done, by our choice of $\alpha$ and $\beta$, and have shown that player $i^*$ moving into slot
$i$ is a lookahead improving move. This contradicts the fact we are at an equilibrium. □

Thus we have established that for all $i \in I$, $c_{i+1} < \frac{c_i}{m}$. Thus, we can bound the optimal
social utility contributed by the slots $i \in I$ by $\frac{m}{m-1}\, c_{i_0} v_{i_0^*}$, where
$i_0 = \min_{i \in I} i$.

Now if $1 \notin I$ then we have achieved our constant coordination ratio since then either
$c_1 v_1 \ge \alpha c_1 v_{1^*}$ or $c_{1^*} v_{1^*} \ge \beta c_1 v_{1^*}$. Hence, we are guaranteed at least
$\min\{\alpha, \beta\}\, c_1 v_{1^*} \ge \min\{\alpha, \beta\}\, c_{i_0} v_{i_0^*}$, that is, at least a constant
factor of the social utility from all the slots in $I$ in the optimal allocation. So we suppose $1 \in I$.

Choose $\alpha_1 = \frac{m}{m-1}\alpha$ and consider the player currently in slot 2. By this choice of
$\alpha_1$, we ensure that this player does not have value more than $\alpha_1 v_{1^*}$. To see this, recall
the player is bidding in a balanced manner and so, by Claim 2.5, his bid $b_2$ satisfies

$$v_2 \ge b_2 \ge \Big(1 - \frac{c_2}{c_1}\Big) v_2 \ge \Big(1 - \frac{1}{m}\Big) v_2.$$

On the other hand, as $1 \in I$ we have

$$b_1 = v_1 < \alpha v_{1^*}.$$

Thus, we must have $v_2 < \frac{m}{m-1}\alpha v_{1^*} = \alpha_1 v_{1^*}$ or the second player would win the
first slot.

Now let $\Gamma$ be the set of players with value at least $\alpha_1 v_{1^*}$. Choose some constant
$\gamma$. If $|\Gamma| < \gamma n$, then player $1^*$'s lookahead utility for moving into slot one is at
least $(1-\gamma)(1-\alpha_1)\, v_{1^*} c_1$. If player $1^*$ stays put, ignoring a repeat move for player
$1^*$, which occurs with probability $\frac{1}{n}$, player $1^*$'s utility in every other circumstance is at
most

$$c_{1^*-1} v_{1^*} < \frac{1}{1-\alpha}\, c_{1^*} v_{1^*} < \frac{\beta}{1-\alpha}\, c_1 v_{1^*}.$$

Since player $1^*$'s utility is the same for both cases when a repeated move occurs and since we can
choose $\beta$ sufficiently small (i.e., $\beta < (1-\gamma)(1-\alpha)(1-\alpha_1)$), player $1^*$ will
improve by moving into slot 1 in this case, contradicting the fact that we are at an equilibrium.

Thus, we may suppose $|\Gamma| \ge \gamma n$. Let $i_1 = \max_{j \in \Gamma} j$. Then the players in
$\Gamma$ contribute at least $\gamma n \alpha_1 v_{1^*} c_{i_1}$ to the social utility. Take a constant
$\delta$ and suppose that $c_{i_1} \ge \delta \frac{c_1}{n}$. Then the players in $\Gamma$ would contribute
at least $\gamma \delta \alpha_1 c_1 v_{1^*}$. Again, this is a constant fraction of the social utility that is
contributed in the optimal allocation by player $1^*$ which, in turn, is a constant factor of the optimal
social utility of the slots in $I$. Thus, we would achieve a constant factor of the optimal social utility.

So we may assume $c_{i_1} < \delta \frac{c_1}{n}$. Consider player $i_1$. His lookahead utility for staying
in place, ignoring the case of a repeated move, is at most

$$c_{i_1-1} v_{i_1} < \frac{1}{1-\alpha}\, c_{i_1} v_{i_1} < \frac{1}{1-\alpha}\,\frac{\delta}{n}\, c_1 v_{i_1}
\le \frac{1}{1-\alpha}\,\frac{\delta}{n}\, c_1 v_{1^*}.$$

We may assume that $v_1 < (1-\epsilon)\alpha_1 v_{1^*}$, for some constant $\epsilon$, otherwise we are
done. Therefore, if player $i_1$ moves to slot 1 then he will earn at least $\epsilon c_1 v_{1^*}$ provided
that player 1 makes the next move. This occurs with probability $1/n$, and so his total lookahead
utility, ignoring a repeated move, is at least $\frac{\epsilon}{n} c_1 v_{1^*}$. Thus by choosing
$\delta < (1-\alpha)\epsilon$, it follows that the coordination ratio is constant in the average case
model. □



3. Industrial Organisation: Cournot Competition 

Next we consider the classical game theoretic topic of duopolistic competition. Economists have 
considered a number of alternative models for market competition [75], prominent amongst them is 
the Cournot model [17]. Our main result here is that the social surplus increases when firms are not
myopic; surprisingly, social welfare is actually maximized when firms use 2-lookahead. 

The Cournot model assumes players sell identical, nondifferentiated goods, and studies competition
in terms of quantity (rather than price). The players take turns choosing some quantity of good to
produce, $q_i$, and pay some marginal cost $c$ to produce it. The price for the good is then set as a
function of the quantities produced by both players, $P(q_i + q_j) = (a - q_i - q_j)$, for some constant
$a > c$. On turn $l$, each player $i$ makes profit $\Pi_i(q_i, q_j) = q_i(a - q_i - q_j - c)$. In this form,
the model has only one equilibrium, called the Cournot equilibrium, where $q_i = (a - c)/3$ for each
player. With the normalisation $a = 1$ and $c = 0$ (which we adopt below), each player at a symmetric
equilibrium makes a profit of $\Pi_i(q_i, q_j) = q_i(1 - 2q_i)$. The consumer surplus is $2q_i^2$ and the
social surplus is then $2q_i(1 - q_i)$.
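
A minimal Python sketch of the myopic benchmark follows. It iterates the one-shot best response $q_i = (a - c - q_j)/2$, which maximises $q_i(a - q_i - q_j - c)$, until the quantities settle, and then evaluates profit, consumer surplus and social surplus at the symmetric outcome. The function names and the number of rounds are choices made for this illustration.

```python
def best_response(q_other, a=1.0, c=0.0):
    """Myopic best response: argmax over q >= 0 of q * (a - q - q_other - c)."""
    return max(0.0, (a - c - q_other) / 2.0)

def cournot_equilibrium(a=1.0, c=0.0, rounds=200):
    """Alternate myopic best responses; the iteration contracts to (a - c) / 3."""
    q_i = q_j = 0.0
    for _ in range(rounds):
        q_i = best_response(q_j, a, c)
        q_j = best_response(q_i, a, c)
    return q_i, q_j

def surpluses(q, a=1.0, c=0.0):
    """Per-firm profit, consumer surplus and social surplus when both firms produce q."""
    price = a - 2.0 * q
    profit = q * (price - c)
    consumer = (2.0 * q) ** 2 / 2.0      # area under linear demand above the price
    social = 2.0 * profit + consumer
    return profit, consumer, social

q_i, q_j = cournot_equilibrium()
profit, consumer, social = surpluses(q_i)
```

With $a = 1$ and $c = 0$ this gives $q = 1/3$, a profit of $1/9$ per firm, consumer surplus $2/9$ and social surplus $4/9$, matching $q(1-2q)$, $2q^2$ and $2q(1-q)$.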

3.1. Production under Lookahead Search. 

We analyse this game when players apply $k$-lookahead search. In industrial settings it is natural to
assume that payoffs are collected over time (as in a repeated game); thus, we focus upon the path
model. We define this model inductively. In a $k$-step lookahead path model, each player $i$'s utility is
the sum of his utilities in the current turn and the $k-1$ subsequent turns. He models the quantities
chosen in the subsequent turns as though the player acting during those turns were playing the game
with a smaller lookahead. More specifically, he assumes that the player acting in the $t$-th subsequent
turn chooses their quantity to maximise their utility under a $(k-t)$-lookahead model. In order to write
this rigorously, let $\pi^i_t$ be the contribution to his utility that player $i$ expects on the $t$-th
subsequent turn (and $\pi^i_0$ be the contribution to his utility that player $i$ expects on his current
turn), let $\pi^j_t$ be the contribution to player $j$'s utility that player $i$ expects on the $t$-th
subsequent turn, and let $q^i_t$ (respectively, $q^j_t$) be the quantity that player $i$ expects to choose
(respectively, expects his opponent to choose) under this model.

Then in the path model, player $i$'s expected utility function is $\Pi^i = \sum_{t=0}^{k} \pi^i_t$. Player
$j$'s expected utility function on player $i$'s turn is $\Pi^j = \sum_{t=0}^{k} \pi^j_t$. Our aim is to
determine the quantities that player $i$ expects to be chosen by both players in the subsequent turns
and, thereby, determine the quantity he chooses this turn and the utility he expects to garner. To
facilitate the discussion, it should be noted that unless noted otherwise, any reference to a "turn"
refers to a turn during player $i$'s calculation and not an actual game turn.

To simplify our analysis, we will define $q_l$ to be the quantity chosen on turn $l$ by whichever player
is acting and $\Pi_l$ to be the expected utility that that player garners from turn $l$ to turn $k$. So
$\Pi_0 = \Pi^i$, $\Pi_1 = \sum_{t=1}^{k} \pi^j_t$, etc. We define $\bar{\Pi}_l$ to be the utility garnered
from turn $l$ to turn $k$ by the player who does not act during turn $l$. So $\bar{\Pi}_0 = \Pi^j$,
$\bar{\Pi}_1 = \sum_{t=1}^{k} \pi^i_t$, etc. It is clear that on each turn $l$, the active player is trying to
maximise $\Pi_l$.

We are now ready to compute these quantities and utilities recursively. We may assume that $a = 1$
and $c = 0$. By our definition above, we have that $\Pi_k = q_k(1 - q_k - q_{k-1})$ and
$\bar{\Pi}_k = q_{k-1}(1 - q_k - q_{k-1})$. Our definition also gives us the recursive formula for $l < k$
that $\Pi_l = q_l(1 - q_l - q_{l-1}) + \bar{\Pi}_{l+1}$ and
$\bar{\Pi}_l = q_{l-1}(1 - q_l - q_{l-1}) + \Pi_{l+1}$. Note that in each of these formulas, $\Pi_l$ and
$\bar{\Pi}_l$ are each functions of $q_t$ for $t \ge l$; $q_{l-1}$ is in fact fixed on the previous turn and
is, therefore, not a variable in $\Pi_l$. It is now possible to calculate $q_l$ recursively.
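
Lemma 3.1. The form of $q_l$ is $\beta_l - \alpha_l q_{l-1}$, where
$\beta_k = \alpha_k = \beta_{k-1} = \frac{1}{2}$, $\alpha_{k-1} = \frac{1}{3}$ and, for $l < k - 1$,

$$\alpha_l = \frac{1}{4 - 2\alpha_{l+1} - \alpha_{l+1}^2 \alpha_{l+2}}, \qquad
\beta_l = \frac{2 - \beta_{l+1} + \alpha_{l+1}\beta_{l+2} - \alpha_{l+1}\alpha_{l+2}\beta_{l+1}}
{4 - 2\alpha_{l+1} - \alpha_{l+1}^2 \alpha_{l+2}}.$$

Proof. We proceed by inducting down from $q_k$. Consider $q_k$, which is the active player's choice on
the final turn. As it is the final turn, he is acting myopically and so will choose $q_k$ so as to maximise
$\Pi_k = q_k(1 - q_k - q_{k-1})$. This parabola as a function of $q_k$ is maximised when
$q_k = \frac{1 - q_{k-1}}{2}$. Doing a similar calculation for
$\Pi_{k-1} = q_{k-1}(1 - q_{k-1} - q_{k-2}) + \bar{\Pi}_k$ gives us the desired values for $\beta_{k-1}$ and
$\alpha_{k-1}$. We now assume the lemma for all $l > L$ and try to prove it for $q_L$. Recall the recursive
formula $\Pi_L = q_L(1 - q_L - q_{L-1}) + \bar{\Pi}_{L+1}$, where
$\bar{\Pi}_{L+1} = q_L(1 - q_{L+1} - q_L) + \Pi_{L+2}$. Taking the derivative of this with respect to
$q_L$ and setting it all equal to zero gives us

$$0 = (1 - 2q_L - q_{L-1}) + (1 - 2q_L - q_{L+1}) - \frac{dq_{L+1}}{dq_L}\, q_L
- \frac{dq_{L+1}}{dq_L}\, q_{L+2} + \frac{\partial \Pi_{L+2}}{\partial q_{L+2}}\,\frac{dq_{L+2}}{dq_L}.$$

The last term of the above sum is zero, since $q_{L+2}$ is chosen so that
$\frac{\partial \Pi_{L+2}}{\partial q_{L+2}} = 0$. Thus, if we plug in the inductive hypothesis into the
above equation and simplify, we get

$$2 - \beta_{L+1} + \alpha_{L+1}\beta_{L+2} - \alpha_{L+1}\alpha_{L+2}\beta_{L+1}
= (4 - 2\alpha_{L+1} - \alpha_{L+1}^2\alpha_{L+2})\, q_L + q_{L-1}.$$

This gives us the desired result. □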

Our goal is now to calculate $q_0$ as this will tell us the quantity that player $i$ actually chooses
on his turn. From the above lemma, we can calculate $q_0$ if we can determine $\alpha_0$ and $\beta_0$.
Using numerical methods on the above recursive formula, we see that as $k \to \infty$, $\alpha_0$
decreases towards a limit of 0.2955977... and $\beta_0$ approaches a limit of 0.4790699.... These values
also converge quite quickly; they both converge to within 0.0001 of the limiting value for $k > 10$.
Thus, at a lookahead equilibrium, player $i$ will choose $q_i \approx 0.4790699 - 0.2955977\, q_j$ and
player $j$, symmetrically, will choose $q_j \approx 0.4790699 - 0.2955977\, q_i$. So each player will
choose a quantity $q \approx 0.369767$, which is more than in the myopic equilibrium. Indeed, it is easy
to show that for every $k \ge 2$, each player will produce more than the myopic equilibrium. This is
illustrated in Figure 1. Observe the quantity produced does not change monotonically with the length
of foresight $k$, but it does increase significantly if non-myopic lookahead is applied at all.
Consequently, in the path model looking ahead is better for society overall but worse for each
individual firm's profitability (as the increase in sales is outweighed by the consequent reduction in
price).

Figure 1. How output varies with foresight k

Theorem 3.2. For Cournot games under the path model, output at a k-lookahead equilibrium peaks
at k = 2 with output 12.5% larger than at a myopic equilibrium (k = 1). As foresight increases, output
is 10.9% larger in the limit. The associated rises in social surplus are 5.5% and 4.9%, respectively.
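
The recursion behind these figures can be sketched numerically. The Python code below runs the backward induction of Lemma 3.1 under one reading of it: the coefficients for turn $l$ are obtained from those of turns $l+1$ and $l+2$ via $\alpha_l = 1/(4 - 2\alpha_{l+1} - \alpha_{l+1}^2\alpha_{l+2})$ and $\beta_l = \alpha_l\,(2 - \beta_{l+1} + \alpha_{l+1}\beta_{l+2} - \alpha_{l+1}\alpha_{l+2}\beta_{l+1})$. The parameter `depth` counts the turns examined, so `depth = 1` is myopic play; this indexing is an assumption of the sketch, chosen because it reproduces the values quoted in Theorem 3.2. The symmetric equilibrium output solves $q = \beta_0 - \alpha_0 q$.

```python
def lookahead_coefficients(depth):
    """Return (alpha_0, beta_0) such that q_0 = beta_0 - alpha_0 * q_prev.

    depth is the number of turns examined (depth = 1 is myopic play);
    this indexing is an assumption made for the sketch.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Final turn: the acting player is myopic, q = 1/2 - q_prev / 2.
    alpha_next, beta_next = 0.5, 0.5
    if depth == 1:
        return alpha_next, beta_next
    # Second-to-last turn: q = 1/2 - q_prev / 3.
    alpha, beta = 1.0 / 3.0, 0.5
    for _ in range(depth - 2):
        denom = 4.0 - 2.0 * alpha - alpha * alpha * alpha_next
        new_alpha = 1.0 / denom
        new_beta = (2.0 - beta + alpha * beta_next
                    - alpha * alpha_next * beta) / denom
        alpha_next, beta_next = alpha, beta
        alpha, beta = new_alpha, new_beta
    return alpha, beta

def equilibrium_output(depth):
    """Symmetric fixed point of q = beta_0 - alpha_0 * q."""
    alpha, beta = lookahead_coefficients(depth)
    return beta / (1.0 + alpha)

def social_surplus(q):
    """Social surplus 2q(1 - q) when both firms produce q (a = 1, c = 0)."""
    return 2.0 * q * (1.0 - q)

myopic = equilibrium_output(1)
two_step = equilibrium_output(2)
deep = equilibrium_output(60)
alpha_limit, beta_limit = lookahead_coefficients(60)
```

Under these assumptions `two_step / myopic` is 1.125 and `deep / myopic` is about 1.109, in line with the 12.5% and 10.9% increases stated in the theorem.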

3.2. Stackelberg Behaviour. 

We could also analyse this game under the leaf model, but this model is both less realistic here and 
trivial to analyse. However, it is interesting to note that for the leaf model with asymmetric lookahead, 
where player i has 2-lookahead and player j has 1-lookahead, we get the same equilibrium as the classic 
Stackelberg model for competition. Thus, the use of lookahead search can generate leader-follower 
behaviours. 
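
The leader's side of this observation can be checked with a few lines of Python. In the sketch below the 2-lookahead player evaluates each quantity only at the leaf, that is, after the myopic opponent has best-responded, and picks the quantity with the highest leaf payoff. The grid search and the normalisation $a = 1$, $c = 0$ are choices made for this illustration.

```python
def follower_best_response(q_leader, a=1.0, c=0.0):
    """Myopic (1-lookahead) reply to the leader's quantity."""
    return max(0.0, (a - c - q_leader) / 2.0)

def leader_leaf_payoff(q_leader, a=1.0, c=0.0):
    """Leader's profit evaluated at the leaf, after the follower has replied."""
    q_follower = follower_best_response(q_leader, a, c)
    return q_leader * (a - q_leader - q_follower - c)

# Coarse grid search over the leader's quantity in [0, 1].
grid = [i / 1000 for i in range(1001)]
q_leader = max(grid, key=leader_leaf_payoff)
q_follower = follower_best_response(q_leader)
```

The search returns a leader quantity of 1/2 and a follower quantity of 1/4, which are the classic Stackelberg quantities $(a-c)/2$ and $(a-c)/4$ for this demand curve.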



4. Unsplittable Selfish Routing

Now consider the unsplittable selfish routing game. We show that any 2-lookahead equilibrium has
a constant coordination ratio. We then show how to derive a similar result for 2-lookahead dynamics.

For this game we have a directed graph $G = (V, E)$ and a set of $n$ agents. Agent $i$ wants to route 1
unit of flow from a source $s_i$ to a destination $t_i$. Each agent $i$ chooses an $s_i$-$t_i$ path $P_i$
and these paths together generate a flow $f$. We assume that there is a linear latency function
$\lambda_e(f_e) = a_e f_e + b_e$ on each edge $e \in E$. The total latency of a flow $f$ is denoted
$l(f) = \sum_{e \in E} \lambda_e(f_e) f_e = \sum_{e \in E} (a_e f_e + b_e) f_e$. The latency of player $i$
is denoted $l_i(f) = \sum_{e \in P_i} (a_e f_e + b_e)$; observe that $l(f) = \sum_i l_i(f)$. For this
game, we consider 2-lookahead in both the leaf and path models, under the average-case lookahead
model.

Recall, in the leaf model, a player $i$'s move from a flow $f$ to a flow $f'$ is lookahead improving if
it lowers his expected latency, that is, if $E(l_i(f'') \mid f') < E(l_i(f'') \mid f)$, where $f''$ is the
flow obtained after the next player (chosen uniformly at random amongst all the players) makes a
(myopic) best response. In the path model a player $i$'s move from a flow $f$ to a flow $f'$ is lookahead
improving if $\frac{1}{2} l_i(f') + \frac{1}{2} E(l_i(f'') \mid f') < \frac{1}{2} l_i(f) + \frac{1}{2} E(l_i(f'') \mid f)$,
where $f''$ is as above.

Theorem 4.1. In the average-case 2-lookahead leaf model, the coordination ratio for an equilibrium
is at most $(1 + \sqrt{5})^2$.

Proof. This proof adapts the result in [3] to our setting. Let $f$ be any flow at a lookahead equilibrium
and $f^*$ be an optimal flow. Suppose player $i$ is taking path $P_i$ in flow $f$ and path $P_i^*$ in flow
$f^*$. Let $J(e)$ be the set of players using edge $e$ in the flow $f$ and let $J^*(e)$ be the same for $f^*$.

At a lookahead equilibrium, player $j$ doesn't want to move from $P_j$ to $P_j^*$. This means that after
a random/worst case next move, the strategy $P_j$ has a higher (expected) payoff than the strategy
$P_j^*$. In particular, it must be the case that the best possible outcome resulting from choosing $P_j$
has a higher (expected) payoff than the worst possible outcome resulting from the strategy $P_j^*$. In
the former case, the best possible outcome is that the next player had also been using the path $P_j$
but then moves completely off the path. Similarly, in the latter case, the worst possible outcome is
that the next player had not been using any edge on the path $P_j^*$ but then changes strategy and also
selects the path $P_j^*$ entirely. Thus we must have:

$$\sum_{e \in P_j^*} \big(a_e(f_e + 2) + b_e\big) \ge \sum_{e \in P_j} (a_e f_e + b_e) - \sum_{e \in P_j : f_e \ge 2} a_e.$$
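
Summing over all players $j$, we obtain

$$\sum_j \sum_{e \in P_j^*} \big(a_e(f_e + 2) + b_e\big)
\ge \sum_j \Big( \sum_{e \in P_j} (a_e f_e + b_e) - \sum_{e \in P_j : f_e \ge 2} a_e \Big)$$
$$= \sum_{e \in E} \sum_{j \in J(e)} (a_e f_e + b_e) - \sum_j \sum_{e \in P_j : f_e \ge 2} a_e
= \sum_{e \in E} (a_e f_e + b_e) f_e - \sum_{e \in E : f_e \ge 2} a_e f_e$$
$$\ge \frac{1}{2} \sum_{e \in E} (a_e f_e + b_e) f_e = \frac{1}{2}\, l(f).$$

Rearranging and applying the Cauchy-Schwarz inequality produces

$$\frac{1}{2}\, l(f) \le \sum_{e \in E} \sum_{j \in J^*(e)} \big(a_e(f_e + 2) + b_e\big)
= \sum_{e \in E} \big(a_e(f_e + 2) + b_e\big) f_e^*$$
$$\le \sqrt{\sum_{e \in E} a_e f_e^2}\,\sqrt{\sum_{e \in E} a_e (f_e^*)^2} + 2 \sum_{e \in E} (a_e f_e^* + b_e) f_e^*
\le \sqrt{l(f)}\,\sqrt{l(f^*)} + 2\, l(f^*).$$

Set $\rho = \sqrt{l(f)/l(f^*)}$ and observe that $\rho^2$ is the coordination ratio, given we choose the
worst lookahead equilibrium $f$. Consequently, $\frac{1}{2}\rho^2 \le \rho + 2$. Solving gives
$\rho \le 1 + \sqrt{5}$ as desired. □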
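
The latency definitions used in this section can be illustrated on a toy instance. In the Python sketch below the edge coefficients and the players' edge lists are hypothetical (the edge lists stand in for paths without modelling an actual graph); the code computes the edge loads $f_e$, the total latency $l(f)$ and each player's latency $l_i(f)$, and evaluates the constant $(1 + \sqrt{5})^2$ from Theorem 4.1.

```python
import math
from collections import Counter

# Hypothetical linear latencies: edge -> (a_e, b_e).
edges = {"e1": (1.0, 0.0), "e2": (2.0, 1.0), "e3": (1.0, 2.0)}
# Hypothetical strategies: player -> list of edges on the chosen path.
paths = {1: ["e1", "e2"], 2: ["e1", "e3"], 3: ["e2"]}

def edge_loads(paths):
    """f_e: the number of players whose path uses edge e."""
    return Counter(e for path in paths.values() for e in path)

def edge_latency(edge, load):
    """Linear latency a_e * f_e + b_e."""
    a, b = edges[edge]
    return a * load + b

def total_latency(paths):
    """l(f): sum over edges of latency times load."""
    loads = edge_loads(paths)
    return sum(edge_latency(e, load) * load for e, load in loads.items())

def player_latency(player, paths):
    """l_i(f): sum of edge latencies along player i's path."""
    loads = edge_loads(paths)
    return sum(edge_latency(e, loads[e]) for e in paths[player])

total = total_latency(paths)
per_player = {i: player_latency(i, paths) for i in paths}
ratio_bound = (1 + math.sqrt(5)) ** 2
```

In this instance the player latencies are 7, 5 and 5, which sum to the total latency of 17, and the bound evaluates to roughly 10.47.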

Next we consider lookahead dynamics and study their coordination ratio.

Theorem 4.2. In the average-case 2-lookahead model, the coordination ratio for lookahead dynamics
is a constant for the leaf model.

Proof. We follow a similar approach to Theorem 4.1 in [30] and start by proving some sub-lemmas.

Lemma 4.3. If player $i$ makes a lookahead improving move from path $P_i$ to $P_i'$ which changes the
flow from $f$ to $f_i'$ then $l_i(f_i') \le 2\, l_i(f) + \frac{1}{n}\, l(f)$.



Footnote: the Cauchy-Schwarz inequality states that for any two vectors $x$ and $y$, we have
$x^T y \le \sqrt{x^T x} \cdot \sqrt{y^T y}$.

Proof. Player $i$'s lookahead cost with $f_i'$ is less than his lookahead cost with $f$. Moreover, we can
lower bound the lookahead cost of $P_i'$ by the quantity

$$\sum_{e \in P_i'} \Big[ \frac{1}{n}(a_e f_e + b_e) + \Big(1 - \frac{1}{n}\Big)\big(a_e(f_e + 1) + b_e\big) \Big]
= \sum_{e \in P_i'} \Big[ a_e(f_e + 1) + b_e - \frac{1}{n} a_e \Big]$$
$$\ge \sum_{e \in P_i'} \Big[ \Big(1 - \frac{1}{n}\Big) a_e(f_e + 1) + b_e \Big]
\ge \sum_{e \in P_i'} \Big(1 - \frac{1}{n}\Big)\big(a_e(f_e + 1) + b_e\big)
= \Big(1 - \frac{1}{n}\Big)\, l_i(f_i').$$

This would be the cost incurred if the randomly selected next player $j$ avoids any edge $e$ that player
$i$ is on (either by moving away from $e$ or not moving onto $e$). Using similar reasoning, we may upper
bound the cost to player $i$ of sticking with $P_i$ by

$$\sum_{e \in P_i} \Big[ \frac{1}{n}(a_e f_e + b_e) + \Big(1 - \frac{1}{n}\Big)\big(a_e(f_e + 1) + b_e\big) \Big]
= \sum_{e \in P_i} \Big[ (a_e f_e + b_e) + \Big(1 - \frac{1}{n}\Big) a_e \Big]$$
$$\le \sum_{e \in P_i} \big[ a_e(f_e + 1) + b_e \big] \le \sum_{e \in P_i} \big[ 2 a_e f_e + b_e \big] \le 2\, l_i(f).$$

Here we assumed the next player $j$ selects every edge $e$ that player $i$ is on (either by staying on $e$
or by moving onto $e$). Therefore, $l_i(f_i') \le 2\big(1 + \frac{1}{n-1}\big)\, l_i(f)$, which implies the
statement in the lemma. □



Applying Lemma 4.3 with Lemma 4.2 in [30], we get:

Lemma 4.4. If agent $i$ changes his path from $P_i$ to $P_i'$, changing the flow from $f$ to $f_i'$, then
$l(f_i') \le l(f) + (d + 1)\, l_i(f_i') - l_i(f)$. In particular, if agent $i$ makes a lookahead improving
move then $l(f_i') \le \big(1 + \frac{2}{n}\big)\, l(f) + 3\, l_i(f)$.

Now, applying Lemma 4.4 with Lemma 4.3 in [30], we get:

Lemma 4.5. Let $f$ be the current flow. Suppose we choose a player at random and they make a
lookahead best response resulting in flow $f'$. Then $E(l(f') \mid f) \le \big(1 + \frac{5}{n}\big)\, l(f)$.

Finally, we prove the following lemma which will imply the statement of the theorem. 

Lemma 4.6. Let f be the current flow. Suppose we choose a player at random and they make a
lookahead best response resulting in flow f. Then either E{l{f')\f) < (1 - ^)l{f) or l{f) < (6 + 
VWl)OPT. 

Proof. Suppose player $i$ changes his path from $P_i$ to $P_i'$ resulting in the flow changing from $f$
to $f_i'$. Thus $E(l(f') \mid f) = \frac{1}{n} \sum_i l(f_i')$.
Case 1: E^Mf^) <E^l^{f)



i 

^ lT.^if) + lw)-kif) 



Case 2: T.^W^>Y.Mf) = Kf) 

Let /* be the optimal flow and let be player z's path in this flow. Let J*{e) be the set of 



players on edge e in /*. Since P- is a lookahead best response, we may apply Lemma 4.3 to see that 
Wd < 2/i(r) + ^lif*). Thus 

/(/) < 4j;/,(/;) < 4 j;2/,(r) + !/(/*) 

i i 

= 12^ /,(/*) = 12 5^^a,/: + 6e 

i i e£E 

< 12X1 ae(/e + l)+6e 

eSE ie J*(e) 

= 12^aefef:bef: + aef: 

< 12v//(/)/(/*) + /(/*) 



where the last inequality follows from Cauchy-Schwartz. Thus, if we set x = y (jp7> the above can 
be transformed into the inequality < 12x + 1. □ 



The remainder of the proof of Theorem 4.2 follows by applying the above lemmas as shown in [30]. □



4.1. Valid Utility Games. 

Here is a bad example for the path model (a slightly modified example applies to the leaf model). It
applies for any number $t$ of lookahead moves. Take a Steiner Set System $S(2, k, n)$. For example, these
exist with $n = q^2 + q + 1$ and $k = q + 1$. Let each subset in the system induce a "sub-game"; thus
each pair of players are together in exactly one subgame. Consequently, each player is in
$\frac{n-1}{k-1} = q + 1 = k$ subgames, and there are $n$ games in total. The strategy set of a player $i$
in subgame $g$ is $\{y_i^g, x_{i,1}^g, x_{i,2}^g, \ldots, x_{i,k}^g\}$. It has one nice strategy and $k$
naughty strategies: player $i$ always gets one point for playing the nice strategy $y_i^g$, but gets two
points for playing a naughty strategy $x_{i,l_i}^g$ provided $\sum_j l_j = i \bmod k$, where the sum is
over all players $j$ who are playing a strategy $x_{j,l_j}^g$; we call $i$ the winner of subgame $g$ in
this case.

Thus a player $i$ who moves next can guarantee $k$ points by playing the nice strategies $y$ but can
guarantee $2k$ points by playing naughty strategies $x$ to win all $k$ subgames it is in. Moreover, the
player can lose at most one game in each subsequent time period. This follows as the next $t = k$
players share exactly one game each with player $i$. Thus the player, in the worst case, receives
$2k + 2(k-1) + \cdots + 4 + 2 = k(k+1)$ in the next $k$ moves. This is greater than the $k^2$ payoff from
playing only the nice strategies $y$.

Consider then the dynamics of this game under $k$-lookahead search. Over time, at any state of
play, the total value of the game will be $2n$; in each of the $n$ subgames all the players are behaving
naughtily. The optimal value however is $n(k + 1)$; in each subgame, $k - 1$ of the players are nice and
one is naughty. So we have shown:

Lemma 4.7. For valid utility games, in the path model the coordination ratio of k-lookahead dynamics
is at least $\frac{n(k+1)}{2n} = \frac{k+1}{2} > \frac{1}{2}\sqrt{n}$. □
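
The counting in this example is easy to check numerically. The Python sketch below takes the parameters $n = q^2 + q + 1$ and $k = q + 1$ and evaluates the number of subgames per player, the worst-case payoff $2k + 2(k-1) + \cdots + 2$ of the naughty strategy over $k$ moves, the $k^2$ payoff of always playing nicely, and the ratio $n(k+1)/(2n)$. The function name and the dictionary layout are choices made for this illustration; the sketch only checks the arithmetic and does not construct the Steiner system itself.

```python
import math

def steiner_example(q):
    """Arithmetic of the bad example for parameters n = q^2 + q + 1, k = q + 1."""
    n = q * q + q + 1
    k = q + 1
    return {
        "n": n,
        "k": k,
        "subgames_per_player": (n - 1) // (k - 1),
        # Win k games, then lose at most one per subsequent period.
        "naughty_worst_case": sum(2 * j for j in range(1, k + 1)),
        "nice_payoff": k * k,
        # Optimal value n(k+1) against dynamics value 2n.
        "ratio": n * (k + 1) / (2 * n),
        "half_sqrt_n": math.sqrt(n) / 2,
    }

results = [steiner_example(q) for q in (2, 3, 5, 7)]
```

For each listed $q$ the naughty worst case $k(k+1)$ exceeds $k^2$, and the ratio $(k+1)/2$ exceeds $\frac{1}{2}\sqrt{n}$.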

4.2. Basic Utility Games. 

For basic utility games, good guarantees can be obtained for the path model. More interestingly, for 
the leaf model lookahead equilibria can be extremely bad, even for 2-lookahead equilibria. 

Lemma 4.8. In basic utility games, the coordination ratio of 2-lookahead equilibria can be arbitrarily
bad in the leaf model.

Proof. Consider the following symmetric 2-player game. Let each player have a groundset {B, T, G}.
A feasible strategy consists of playing at most one action in the groundset. We create a submodular
social function using the table

           ∅       B       T      G
    B      6       6       6      1
    T     κ-9     κ-9      7      4
    G     κ-5     κ-10     8      5



Set $\gamma(\emptyset, \emptyset) = 0$. Then let the $ij$-th entry of the matrix, $\delta_{ij}$, be the
marginal value of adding action $i$ when action $j$ is being played by the other player. For example,
$\gamma(B, \emptyset) = \gamma(\emptyset, \emptyset) + \delta_{B,\emptyset} = 0 + 6 = 6$. Similarly,
$\gamma(B, B) = 12$, $\gamma(T, \emptyset) = \kappa - 9$, $\gamma(G, \emptyset) = \kappa - 5$,
$\gamma(B, G) = \kappa - 4$, $\gamma(B, T) = \kappa - 3$, $\gamma(T, T) = \kappa - 2$,
$\gamma(T, G) = \kappa - 1$, $\gamma(G, G) = \kappa$.

We need to extend this definition to all subsets. Suppose that Player 1 is currently choosing $S_1$ and
Player 2 is currently choosing $S_2$. To complete the definition of $\gamma$, we say that the marginal
value of adding action $i$ to the subset $S = S_1 \cup S_2$ is
$\delta_{i,S} = \min_{j \in S_1 \cup S_2} \delta_{ij}$.

Note that this is true if $i$ is added to $S_1$ and if $i$ is added to $S_2$. This process produces a
submodular social function. The payoff functions are then defined in accordance with the Vickrey
condition.

Clearly, as the players are constrained to play singleton actions, the optimal solution
$\Omega = \{G, G\}$ has value $\kappa$. We claim that $\{B, B\}$, with social value 12, is the only
equilibrium in the leaf model. Thus, for any $\kappa$, we can be a factor $\kappa/12$ away from the
optimal social value.

To prove this, first suppose that Player 1 plays B. According to the Vickrey condition, the best
response of Player 2 is to play T (she needs to choose $*$ to maximize $\gamma(B, *)$). The payoff to
Player 1 is then $\gamma(B, T) - \gamma(\emptyset, T) = (\kappa - 3) - (\kappa - 9) = 6$. Second, suppose
that Player 1 plays T. According to the Vickrey condition, the best response of Player 2 is to play G
(she needs to maximize $\gamma(T, *)$). The payoff to Player 1 is then
$\gamma(T, G) - \gamma(\emptyset, G) = (\kappa - 1) - (\kappa - 5) = 4$. Finally, suppose that Player 1
plays G. According to the Vickrey condition, the best response of Player 2 is to play G; observe this
must be the case as $(G, G)$ is the optimal solution. The payoff to Player 1 is then
$\gamma(G, G) - \gamma(\emptyset, G) = \kappa - (\kappa - 5) = 5$.

Thus, with 2-lookahead, Player 1 will always think it in his interest to play B. (Note that in the
leaf model, it is irrelevant for Player 1 what strategy Player 2 is currently playing.) By a symmetric
argument, Player 2 will always think it in her interest to play B. □
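
The case analysis in this proof can be replayed mechanically. The Python sketch below builds the social function $\gamma$ on singleton strategies from the marginal-value table, lets Player 2 best-respond by maximising $\gamma(s_1, \cdot)$, and evaluates Player 1's Vickrey payoff $\gamma(s_1, s_2) - \gamma(\emptyset, s_2)$ at the resulting leaf. The value $\kappa = 100$ and all identifier names are assumptions made for the illustration, and the first table column is read as the case where the other player plays nothing.

```python
KAPPA = 100          # hypothetical value; any sufficiently large kappa behaves the same way
EMPTY = None         # stands for the empty strategy
ACTIONS = ["B", "T", "G"]

# delta[i][j]: marginal value of adding action i when the other player plays j.
delta = {
    "B": {EMPTY: 6, "B": 6, "T": 6, "G": 1},
    "T": {EMPTY: KAPPA - 9, "B": KAPPA - 9, "T": 7, "G": 4},
    "G": {EMPTY: KAPPA - 5, "B": KAPPA - 10, "T": 8, "G": 5},
}

def gamma(s1, s2):
    """Social value when Player 1 plays s1 and Player 2 plays s2 (singletons or EMPTY)."""
    if s1 is EMPTY and s2 is EMPTY:
        return 0
    if s1 is EMPTY:
        return delta[s2][EMPTY]
    return gamma(EMPTY, s2) + delta[s1][s2]

def best_response(s1):
    """Player 2's reply: under the Vickrey condition she maximises gamma(s1, *)."""
    return max(ACTIONS, key=lambda s2: gamma(s1, s2))

def leaf_payoff(s1):
    """Player 1's Vickrey payoff after Player 2 has best-responded."""
    s2 = best_response(s1)
    return gamma(s1, s2) - gamma(EMPTY, s2)

replies = {s1: best_response(s1) for s1 in ACTIONS}
payoffs = {s1: leaf_payoff(s1) for s1 in ACTIONS}
choice = max(ACTIONS, key=leaf_payoff)
```

The computed leaf payoffs are 6, 4 and 5 for B, T and G respectively, so the 2-lookahead player picks B, even though $\gamma(B, B) = 12$ is far below $\gamma(G, G) = \kappa$.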

5. Shapley Network Design Games 

For our final example we show that the use of lookahead search may allow for "uncoordinated" 
cooperative behaviours. By looking ahead, a player may select a cooperative move whose consequence 
can be to induce other players to also make cooperative moves. We give a very simple illustration 
of this behaviour. Consider the following Shapley network design game: Given a network, there is a 
single source s and a single sink t. We have n players, each wanting to route from s to t. There are N
paths (where N may be exponential) to choose from. The cost of any link is equally shared between
those players that use it. The coordination ratio is then easily seen to be at least n. However, the
coordination ratio improves by a factor k when the players use k-lookahead search.

Theorem 5.1. The coordination ratio of k-lookahead dynamics for Shapley network design games in
the leaf model is at most n/k.

Proof. We present the proof for the worst-case lookahead model. The proof for the average-case model
uses the same idea. Assume the players are currently choosing the paths $\{P_1, P_2, \ldots, P_n\}$.
Consider the depth $k$ tree when the players move in the order $1, 2, \ldots, k$. Take a decision node
for player $k - 1$. This has $N$ children that are decision nodes for player $k$. Let the paths chosen by
player $k$ at these nodes be $Q_1, Q_2, \ldots, Q_N$, respectively. Suppose that in response to this move,
player $k - 1$ chooses the path $P_j$. We claim that $P_j = Q_j$.

<p,r) = E ^+ E 1^ 

+ y -^^+ y 



Thus 




Now since c(Qj , T) < c(P, T) we have 




= c(Q,,r') 

This proves the claim. Applying induction, we see that each player $1, \ldots, k$ will play the same
strategy $P^*$, and thus receive the same payoff. Let us take the worst case choice for players
$2, \ldots, k$ from the point of view of player 1. If $P^* = P^{SP}$, the shortest $s$-$t$ path, then each
of the $k$ chosen players will have a cost of at most

$$\frac{c(P^{SP})}{k} \le \frac{OPT}{k}.$$

Thus, if $P^* \ne P^{SP}$, then player 1 can guarantee himself a cost of at most $\frac{OPT}{k}$. This
argument applies for all players so, in an equilibrium, the total cost is at most $\frac{n}{k}\, OPT$. □